This study examined five issues relative to the use of different OLS regression and
HLM models in identifying effective schools and teachers. First, OLS regression models
using first- and second-order interactions produced results that were very close to those
produced by two-level HLM models at the school level and two- and three-level HLM
models at the teacher level. Second, most OLS regression and HLM models used in this
study accounted for more than seventy percent of the variance in student achievement in
reading and mathematics. Generally, as more information was included in the equations,
more variance was accounted for. Third, the results produced by all of the models were
extremely consistent. The correlations among results produced by the various models
were generally above .90. Fourth, correlations of results with important school,
teacher, and student level contextual variables were negligible for all models, meaning
that the various models produced results that were free from bias relative to important
school, student, and classroom level contextual variables. Fifth, correlations of results
with pre-score characteristics were negligible for all models, meaning that the various
models produced results that were free from bias relative to level of pretest scores.
Taking all results into consideration, the recommended solution was to implement a
two-level HLM model (student-school) to determine school effect and then to adjust the
empirical Bayes residuals from that model with an adjustment for shrinkage to form the
basis for the estimates of teacher effect. This paper concludes with the appropriate
formulas for accomplishing this.
The need for instructional improvement in the Dallas Public Schools, as in most urban
districts, had been thoroughly documented over a period of twenty years. After a period of rapid
achievement growth in the early and mid 1980s, student achievement in the Dallas schools had
leveled out. In 1990, responding to this need, the District's Board of Education appointed a
citizens' task force, the Commission for Educational Excellence, to formulate recommendations
to accelerate the needed improvement. After a year of community hearings and extensive study,
the Commission recommended a six-point plan for massive educational reform. At the heart of
the Commission's recommendations was an accountability system that fairly and accurately
evaluated schools and teachers on their contributions to accelerating student growth in a number
of important and valued outcomes of schooling. This was coupled with a movement to give
schools more decision-making authority about personnel, curriculum, and most other aspects of
schooling. In exchange for this authority, school staffs were to be held accountable for their
actions. As part of this recommendation, $2.4 million was set aside as an incentive award to
reward effective schools and their professional and support staffs.

It then became the task of the District's Research, Planning, and Evaluation Department 
to develop, pilot test, and implement an evaluation system to accomplish the goals of the 
Commission. The first step in accomplishing this task was the appointment of an Accountability 
Task Force to oversee the process. This task force, consisting of teachers, principals, parents, 
members of the business community, and central office administrators, was charged with the 
responsibility of advising the General Superintendent concerning the implementation of a 
performance incentive plan, working with the administration to ensure the validity of the 
selection procedure and subsequent results of the incentive plan, and serving as a review 
committee to examine any issues raised by personnel concerning questions of equity and fairness 
of the procedures. During a year of exhaustive deliberations, a number of requirements for the 
methodology associated with this plan were developed. Among these were: 

1. It must be value-added.

2. It must include multiple outcome variables. 2 

3. Schools must only be held accountable for students who have been exposed to their 
instructional program (continuously enrolled students). 

4. It must be fair. Schools must derive no particular advantage by starting with high- 
scoring or low-scoring students, minority or white students, high or low socioeconomic 
level students, or limited English proficient or non-limited English proficient students. 
In addition such factors as student mobility, school overcrowding, and staffing patterns 
over which the schools have no control must be taken into consideration. 

5. It must be based on cohorts of students, not cross-sectional data. 

Within the five aforementioned parameters, a number of statistical models are possible. 
This study examines alternative methodologies for determining school effect then extends these 
studies to the determination of teacher effect. These models are designed to isolate the effect of a
school's or teacher's practices on important student outcomes. The school effect can be
conceptualized as the difference between a given student's performance in a particular school
and the performance that would have been expected if that student had attended a school with
similar context but with practice of average effectiveness. The teacher effect can be
conceptualized similarly at the teacher level.

2 Performance indicators for 1995-96 include Iowa Tests of Basic Skills and Tests of Achievement and Proficiency
reading and mathematics, grades 1-9; Spanish Assessment of Basic Education, grades 1-6; Texas Assessment of
Academic Skills reading and mathematics, grades 3-8 and 10; writing, grades 4, 8, and 10; science and social studies,
grades 4 and 8; Texas Assessment of Academic Skills, Spanish version, grades 3 and 4; 72 standardized final
examinations in language, mathematics, social studies, science, ESOL, reading, and world languages, grades 9-12;
promotion rate, grades 1-8; student attendance, grades 1-12; graduation rate, grades 9-12; Scholastic Aptitude Test
percent tested and scores, grades 9-12; dropout rate, grades 7-12; student enrollment in prehonors/honors courses,
grades 7-12; student enrollment in advanced diploma plans, grades 9-12; students enrolled in advanced placement
courses, grades 11-12; Preliminary Scholastic Aptitude Test percent tested and scores, grades 9-12; and percent
passing Advanced Placement Exams, grades 11-12. The system is run with only continuously enrolled students and
includes a staff attendance incentive, minimum percent eligible tested requirements, and a requirement that at least
one-half of a school's cohorts must outgrow the national norm group on the ITBS and TAP in reading and
mathematics.

Background 

Interest in performance-based or outcomes-based teacher evaluation dates all the way 
back to fifteenth-century Italy where a teacher master's salary was dependent upon his or her 
students' performance. Despite long-term interest, progress in actually linking student outcomes 
to school and teacher performance has been very limited. 

State Departments of Education have taken a leadership role in attempting to accomplish 
this at the school and district level. Forty-six of fifty states have accountability systems that 
feature some type of assessment. Twenty-seven of these systems feature reports at the school, 
district, and state level; three feature school level reports only; six feature reports at both the 
school and district level; seven feature reports at the district and state level; two feature reports at 
the state level only; and one is currently under development (Council of Chief State School 
Officers, 1995). When one reviews these systems, it is obvious that their designers are not
familiar with the literature on value-added systems, since only two states, South Carolina (May,
1990) and Tennessee (Sanders and Horn, 1995), have used appropriate value-added statistical
methodology in implementing such systems. Most of the rest tend to evaluate students, not
schools or districts, and generally cause more harm than good with systematic misinformation 
about the contributions of schools and districts to student academic accomplishments. By 
comparing schools on the basis of unadjusted student outcomes, state reports are often 
systematically biased against schools with population demographics that differ from the norm, a 
fact that was graphically illustrated by Jaeger (1992). In attempting to eliminate this bias, a 
number of states have gone to non-statistical grouping techniques, an approach that has serious 
limitations when there is consistent one-directional variance on the grouping characteristics 
within groups. 

Investigators throughout the world have conducted and reported numerous studies aimed 
at identifying effective schools as well as estimating the magnitude and stability of school 
contributions to student outcomes. Good and Brophy (1986) provide an excellent review of this 
work. Researchers have been working for a number of years on appropriate methodology for 
adjusting for the effects of student and school demographic variables in estimating school effects. 
One approach has been to regress school mean outcome measures on school means of one or 
more background variables. This approach is only adequate to the extent that there is not much 
within school variance, that is, the school impacts all students similarly. Mendro and Webster
(1993) demonstrated that this is not the case and that using school level models to attempt to
estimate school effects, while better than the common practice of reporting unadjusted test 
scores, produces extremely unstable estimates of school effects. 



Another approach, one that has received generally widespread acceptance among 
educational researchers, involves the aggregation of residuals from student-level regression 
models (Aiken and West, 1991; Bano, 1985; Felter and Carlson, 1985; Kirst, 1986; Klitgard and 
Hall, 1973; McKenzie, 1983; Millman, 1981; Saka, 1984; Webster and Olson, 1988; Webster,
Mendro, and Almaguer, 1994). These techniques can incorporate a large number of input, 
process, and outcome variables into an equation and determine the average deviation from the 
predicted student outcome values for each school. Schools are then ranked on the average 
deviation. Some advantages of multiple regression analysis over other statistical techniques for 
this application include its relative simplicity of application and interpretation, its robustness, and 
the fact that general methods of structuring complex regression equations to include 
combinations of categorical and continuous variables and their interactions are relatively 
straightforward (Aiken and West, 1991; Cohen, 1968; Cohen and Cohen, 1983; Darlington, 
1990). 
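The residual-aggregation approach described above can be sketched as follows. This is only an illustrative sketch under stated assumptions: the data are simulated, the single pretest predictor and the function name `school_mean_residuals` are hypothetical, and the actual models in this paper use many more variables.

```python
import numpy as np

def school_mean_residuals(X, y, school_ids):
    """Fit one student-level OLS model, then average the residuals within each school."""
    # Add an intercept column to the student-level predictors.
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ coef
    schools = np.unique(school_ids)
    means = np.array([residuals[school_ids == s].mean() for s in schools])
    # Rank schools from the largest to the smallest average deviation.
    order = np.argsort(-means)
    return schools[order], means[order]

# Simulated illustration (hypothetical values): three schools of 200 students each,
# where school 0 subtracts 5 points and school 2 adds 5 points.
rng = np.random.default_rng(0)
school_ids = np.repeat([0, 1, 2], 200)
pretest = rng.normal(50, 10, size=600)
true_effect = np.array([-5.0, 0.0, 5.0])[school_ids]
posttest = 10 + 0.8 * pretest + true_effect + rng.normal(0, 3, size=600)

ranked_schools, ranked_means = school_mean_residuals(
    pretest.reshape(-1, 1), posttest, school_ids
)
```

Because the regression includes an intercept, the student residuals sum to zero, so each school's mean residual is a deviation from the district-wide expectation rather than an absolute score.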



Finally, hierarchical linear modeling (HLM) provides estimates of linear equations that 
explain outcomes for group members as a function of the characteristics of the group as well as 
the characteristics of the members. Because HLM involves the prediction of outcomes of 
members who are nested within groups which in turn may be nested in larger groups, the 
technique should be well suited for use in education. The nested structure of students within 
classrooms and classrooms within schools produces a different variance at each level for factors 
measured at that level. Bryk et al. (1988) cited four advantages of HLM over regular linear
models. First, it can explain student achievement and growth as a function of school-level or 
classroom-level characteristics while taking into account the variance of student outcomes within 
schools. Second, it can model the effects of student characteristics, such as gender, race- 
ethnicity, or socioeconomic status (SES), on achievement within schools or classrooms, and then 
explain differences in these effects between schools or classrooms using school or classroom 
characteristics. Third, it can model the between and within-school variance at the same time, and 
thus produce more accurate estimates of student outcomes. Finally, it can produce better 
estimates of the predictors of student outcomes within schools and classrooms, by “borrowing” 
information about these relationships from other schools and classrooms. HLM models are 
discussed in the literature under a number of different names by different authors from a number 
of diverse disciplines (Bryk and Raudenbush, 1992; Dempster, Rubin and Tsutakawa, 1981; 
Elston and Grizzle, 1962; Goldstein, 1987; Henderson, 1984; Laird and Ware, 1982; Longford, 
1987; Mason, Wong, and Entwistle, 1984; Rosenberg, 1973). 
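The "borrowing" of information mentioned above is commonly expressed as shrinkage of a group's raw estimate toward the overall mean. The sketch below assumes the simplest random-intercept case with known variance components; the values of `sigma2` (within-school variance) and `tau2` (between-school variance) and the function name are hypothetical, and this is not presented as the procedure used by the HLM software in this study.

```python
import numpy as np

def shrink_school_means(raw_means, n_students, sigma2, tau2):
    """Shrink raw school mean residuals toward the grand mean (taken as 0 here).

    The weight for school j is tau2 / (tau2 + sigma2 / n_j), so schools with
    few students are pulled more strongly toward the grand mean.
    """
    raw_means = np.asarray(raw_means, dtype=float)
    n_students = np.asarray(n_students, dtype=float)
    reliability = tau2 / (tau2 + sigma2 / n_students)
    return reliability * raw_means, reliability

# Two schools with the same raw mean residual but very different sizes.
shrunken, reliability = shrink_school_means(
    [4.0, 4.0], [5, 500], sigma2=100.0, tau2=4.0
)
```

In this illustration the school with 5 students keeps only one-sixth of its raw mean, while the school with 500 students keeps about 95 percent of it.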

Extending this methodology to the teacher level becomes more complex. The issue really 
is not one of whether or not student achievement data should be used in teacher evaluation, but 
rather entails a methodological debate over ways to operationalize and implement such a system. 
Unfortunately, the preponderance of literature in the field concentrates upon reasons student 
achievement data cannot be used for teacher evaluation rather than upon credible ways to use it. 
Some of the concerns raised in the literature include: 

• the development of procedures to account for the difficulty in measuring the long-term
development of skills which may not be measured in year-to-year growth patterns
(TEA, 1988).

• the assessment of diverse areas of achievement which do not have readily available
standardized tests is an area of concern when dealing with non-academic area teachers.

• programs which pull out students for remediation, programs which involve
team-teaching, and programs with extensive use of instructional aides inhibit the estimation
of an individual teacher's contribution to improved student achievement.

• norm-referenced standardized tests sample broad subject domains and are unlikely to
match closely the curriculum in particular classrooms at particular times (Haertel,
1986).

• well-established, broadly applicable, and accepted achievement measures are not
available in all the relevant areas of learning (Bano, 1985).

• standardized achievement tests are unlikely to reflect the full range of instructional
goals in their subject areas. Norm-referenced tests tend to ignore the higher-order
skills. Therefore it is likely that products of superior teaching are not measured
adequately or completely by standardized achievement tests (Bano, 1985).

• what the student brings to the classroom in terms of ability, home and peer influence,
motivation and other influences is very powerful in affecting academic achievement at
the end of the year (Iwanicki, 1986).

• the statistical methods used to control for non-teacher factors cannot take into account
all of the relevant factors. More importantly, the methods will be incomprehensible to
those being evaluated and difficult to defend in public (Bano, 1985).

• non-statistical models for controlling non-teacher factors are easier to explain, but
cannot take into account most of the necessary circumstances (Bano, 1985).

• attempting to use any one of a number of regression-based techniques at the teacher
level creates a rather subtle problem related to the statistical concept of "degrees of
freedom." In general, the number of degrees of freedom upon which a statistical
procedure is based depends on the sample size (N) and the number of sample statistics
(i.e., variables in multiple regression). The sample size (i.e., number of students) for a
teacher is relatively small to start with. However, the usable sample size becomes even
smaller because development of the regression equations requires existing test scores
for each student for at least two successive years. As an example, a second-grade
teacher may have a class of 22 students, but may only have test scores from the first
grade for 11 of those students. Since degrees of freedom also depends on the number of
variables in the multiple regression equation, a regression equation with four (4)
variables would leave just seven (7) degrees of freedom. The stability of a projected
regression line is primarily dependent on the number of degrees of freedom. Seven is
generally not enough for stable estimates. As a general rule of thumb, thirty students
per variable has been recommended as a minimum number upon which to base a
projected regression line.
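The degrees-of-freedom arithmetic in the last concern can be written out as a small sketch. The function name and the `count_intercept` option are hypothetical; the default follows the counting used in the example above (students minus variables), and the option shows that one further degree of freedom is used if an intercept is also estimated.

```python
def residual_degrees_of_freedom(n_students, n_variables, count_intercept=False):
    """Degrees of freedom left after fitting a regression to one teacher's class."""
    used = n_variables + (1 if count_intercept else 0)
    return n_students - used

# The second-grade example: 11 students with usable scores, 4 variables.
df_text_example = residual_degrees_of_freedom(11, 4)
# The same class if an intercept is counted as an additional estimated quantity.
df_with_intercept = residual_degrees_of_freedom(11, 4, count_intercept=True)
```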

Nontechnical concerns most often found in the literature include the concern that 
objectives that are not measured by the tests will be omitted by teachers, that other duties such as 
playground supervision and school committee work may be slighted, and that, with each teacher 
being rated separately, the collegiality necessary to building good instructional teams within a 
building may be damaged. 

Most of the methodological issues raised above can be resolved. (1) Longitudinal growth
curves, or alternatively, relationships based upon two or more years of data, can be formulated on 
important outcome variables. In the case of relationships based upon two years of data, 
replication is necessary to assure greater reliability. (2) Criterion-referenced tests can be 
developed and used to assess diverse areas of achievement. (3) In cases where there are pull-out 
or send-in programs, team teaching, or instructional aides, data can be provided at the team level 
rather than at the individual teacher level. (4) Measures in addition to norm-referenced tests can 
be used. (5) Constituents are primarily interested in basic skills. To the extent that measures are 
needed in music, art, physical education, etc., they can be developed. (6) Criterion-referenced 
tests can be used to measure higher-order thinking skills. In addition, performance testing can be 
used as one outcome variable with the outcomes being weighted by the reliability of the 
instruments. (7) What the student brings to the classroom in terms of background variables can 
be statistically controlled. These variables typically account for 9-20% of the variance in student 
achievement (Webster, Mendro, and Almaguer, 1993). (8) It has been the authors’ experience 
that gender, ethnicity, limited English proficiency status, and free-or-reduced-lunch status, plus 
their interactions, account for most of the variance that can be attributed to background variables. 
They are easy to explain and defend. (9) Non-statistical models for controlling non-teacher 
factors are misleading and should not be used (Webster and Edwards, 1993). (10) The degrees of 
freedom problem is real in that one must worry about the stability of the regression line when it 
is applied to one teacher. At the teacher level, replication over several years is the best safeguard 
against erring because of small sample size. 

Previous studies conducted in the Dallas Public Schools have demonstrated that equations
using school means produce spurious results because they do not take into account the
within-school variance (Mendro and Webster, 1993); that analysis of unadjusted gain scores produced
different results than those produced by regression and HLM models (r = .73 to .80); that the
results produced by gain score analysis were systematically biased against schools with higher
than average Black and poor student populations and in favor of schools with higher than
average White, economically advantaged, and Hispanic populations (Webster et al., 1995); that
reporting of absolute test scores without any additional analysis produced results that were
systematically biased against schools with higher than average percentages of minority, poor, and
Black students and in favor of schools with higher than average White and
economically advantaged populations and that were very different from those produced by HLM
and regression analysis (r = .34 to .60) (Webster et al., 1995); and that longitudinal HLM and
regression analyses using two years of individual student data for prediction without taking into
consideration contextual variables produced results that were somewhat consistent with the HLM
and regression models to be discussed in this paper (r = .89 to .93) but that were systematically
biased against schools with higher than average minority, Black, and poor populations and
systematically biased in favor of schools with higher than average White, Hispanic, and
economically advantaged populations (Mendro et al., 1994; Webster and Olson, 1988).

This paper examines the applicability of selected HLM and regression models to the 
identification of school and teacher effect. 



Methodology 

Sample 

The sample used in this study consisted of all students in the Dallas Public Schools who 
were in grade 3 in 1994 and grade 4 in 1995 and who had complete data in reading and 
mathematics. These represent longitudinal cohorts. The temptation to use simulated data was
great; however, one of the major purposes of this study was to determine if the HLM routines
would execute on real large-scale data sets. 



Instrumentation 

The instrumentation used for the study was the Iowa Tests of Basic Skills Reading and 
Total Mathematics subtests. Raw scores were the unit of analysis. 



Purpose 

Five major issues were investigated in this study. They were: 

1. What is the correlation among the results produced by the various models for
predicting (1) school effect and (2) teacher effect?

2. How much variance is accounted for by each of the alternate models? 

3. How consistent are the results? 

4. How do the results produced by the various models correlate with individual 
student and aggregate school demographic variables? (ethnicity, 
socioeconomic status, English proficiency) 

5. How do the results produced by the various models correlate with individual 
student and aggregate school pre-score characteristics? 



Analysis 



The analysis consisted of a series of regression and hierarchical linear models. All 
analyses, except where specified, were completed on residuals that were obtained from solving a 
series of student level equations designed to account for the effects of ethnicity, limited English 
proficient status, gender, socioeconomic status, and their first and second order interactions. 
Equations were developed for both predictor and criterion variables. The unit of analysis for the 
second-level regression and HLM equations was the residuals obtained from the aforementioned 
first-level equations. 
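A minimal sketch of this two-step use of residuals follows. It assumes simulated data, three hypothetical 0/1 context indicators, and a single pretest; the interaction terms used in the actual first-level equations are omitted for brevity, and the helper name `ols_residuals` is not from the paper.

```python
import numpy as np

def ols_residuals(design, y):
    """Residuals from an OLS fit of y on design (an intercept column is added here)."""
    full = np.column_stack([np.ones(len(y)), design])
    coef, *_ = np.linalg.lstsq(full, y, rcond=None)
    return y - full @ coef

rng = np.random.default_rng(1)
n = 1000
# Hypothetical 0/1 context indicators (for example LEP status, gender, lunch status).
context = rng.integers(0, 2, size=(n, 3)).astype(float)
pre = 40 - 4 * context[:, 0] - 3 * context[:, 2] + rng.normal(0, 8, n)
post = 12 + 0.7 * pre - 2 * context[:, 0] + rng.normal(0, 5, n)

# First level: remove the context effects from both predictor and criterion.
pre_resid = ols_residuals(context, pre)
post_resid = ols_residuals(context, post)

# Second level: predict the posttest residual from the pretest residual.
stage2_resid = ols_residuals(pre_resid.reshape(-1, 1), post_resid)
```

By construction, OLS residuals are uncorrelated with the variables they were regressed on, which is why results built on these residuals carry no linear association with the context indicators included in the first-level equations.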

Specifications for the equations follow. 



School Effects 

At the school level eleven different models were tried. All models included two stages. 
The first stage, or fairness stage, was designed to take the effects of important contextual 
variables out of the subsequent second stage equations for both predictor and criterion variables. 
Variables used in the first regression and prediction stage included: 



$Y_{ij}$ = Outcome variable of interest for each student $i$ in school $j$; $l$ is a measure for
grade/subject/year.

$X_{1ij}$ = Black English Proficient Status (1 if black, 0 otherwise).

$X_{2ij}$ = Hispanic English Proficient Status (1 if Hispanic, 0 otherwise).

$X_{3ij}$ = Limited English Proficient Status (1 if LEP, 0 otherwise).

$X_{4ij}$ = Gender (1 if male, 0 if female).

$X_{5ij}$ = Free or Reduced Lunch Status (1 if subsidized, 0 otherwise).

$X_{6ij}$ = School Mobility Rate (same for all $i$ in each $j$).

$X_{7ij}$ = School Overcrowdedness (same for all $i$ in each $j$).

$X_{8ij}$ = Block Average Family Income

$X_{9ij}$ = Block Average Family Education Level

$X_{10ij}$ = Block Average Family Poverty Level

$X_{kij}$ = indicates the variable $k$ for the $i$-th student in school $j$, for $i = 1, 2, \ldots, I_j$ and $j = 1, 2, \ldots, J$.

The model was

$$Y_l = X \Lambda_l + \varepsilon_l, \qquad \varepsilon_l \sim \mathrm{MVN}(0, I\sigma^2),$$

and

$Y_l^{94}$, $Y_l^{95}$ = Student's scores in 93/94 and 94/95 respectively, for math and reading.

Variables used in the second or prediction stage included: 
Student Level Variables: 
$r^{95}_{ij}$ = Posttest Residual Score from fairness stage for measure $l$ for the $i$-th student in
school $j$. In this paper it represents ITBS Reading 1995 or ITBS Mathematics
1995.

$r^{94}_{hij}$ = $h$-th predictor used to estimate $r^{95}_{ij}$ for the $i$-th student in school $j$. This is a Pretest
Residual score from the fairness stage. In this paper it represents ITBS
Reading 1994 and ITBS Mathematics 1994.

$$r_{ij} = Y_{ij} - \hat{Y}_{ij} \quad \text{from OLS}$$



School Level Variables: 

$W_{1j}$ = School Mobility

$W_{2j}$ = School Overcrowdedness

$W_{3j}$ = School Average Family Income

$W_{4j}$ = School Average Family Education

$W_{5j}$ = School Average Family Poverty Index

$W_{6j}$ = School Percentage on Free or Reduced Lunch

$W_{7j}$ = School Percentage Minority

$W_{8j}$ = School Percentage Black

$W_{9j}$ = School Percentage Hispanic

$W_{10j}$ = School Percentage Limited English Proficient

Because of limitations in the number of variables that can be used in HLM due to the 
small n’s in some schools, and the high correlations among the background variables, the 
methodology described in the first stage (fairness stage) was utilized so that HLM was actually 
run on residuals. The first step in using HLM involves centering the data. The data may be 
centered around a grand mean or around individual school means. In this study data were 
centered around the grand mean. If data are not centered, severe problems with multicollinearity
are encountered and the HLM program cannot invert many of the matrices associated with 
individual schools. 
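The two centering options mentioned above can be illustrated with a short sketch. The six scores and two schools are made-up values, and the function names are hypothetical; this only shows what each form of centering does to the data, not how the HLM program applies it internally.

```python
import numpy as np

def grand_mean_center(x):
    """Subtract the mean of all students from every score."""
    x = np.asarray(x, dtype=float)
    return x - x.mean()

def group_mean_center(x, groups):
    """Subtract each school's own mean from the scores of its students."""
    x = np.asarray(x, dtype=float)
    groups = np.asarray(groups)
    centered = np.empty_like(x)
    for g in np.unique(groups):
        mask = groups == g
        centered[mask] = x[mask] - x[mask].mean()
    return centered

scores = np.array([10.0, 12.0, 14.0, 30.0, 32.0, 34.0])
schools = np.array([0, 0, 0, 1, 1, 1])

grand = grand_mean_center(scores)            # [-12, -10, -8, 8, 10, 12]
within = group_mean_center(scores, schools)  # [-2, 0, 2, -2, 0, 2]
```

Grand-mean centering keeps the between-school differences in the predictor, whereas school-mean centering removes them, so the choice changes what the level-1 intercept for each school represents.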



School level models included: 



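DALLAS-FULL (1.0)

Stage 1:

$$
\begin{aligned}
Y_{ij} ={}& \Lambda_0 + \Lambda_1 X_{1ij} + \Lambda_2 X_{2ij} + \Lambda_3 X_{3ij} + \Lambda_4 X_{4ij} + \Lambda_5 X_{5ij} + \Lambda_6 X_{6ij} + \Lambda_7 X_{7ij} + \Lambda_8 X_{8ij} + \Lambda_9 X_{9ij} + \Lambda_{10} X_{10ij} \\
&+ \Lambda_{11}(X_{1ij}X_{4ij}) + \Lambda_{12}(X_{2ij}X_{4ij}) + \Lambda_{13}(X_{3ij}X_{4ij}) + \Lambda_{14}(X_{1ij}X_{5ij}) + \Lambda_{15}(X_{2ij}X_{5ij}) \\
&+ \Lambda_{16}(X_{3ij}X_{5ij}) + \Lambda_{17}(X_{4ij}X_{5ij}) + \Lambda_{18}(X_{1ij}X_{4ij}X_{5ij}) + \Lambda_{19}(X_{2ij}X_{4ij}X_{5ij}) + \Lambda_{20}(X_{3ij}X_{4ij}X_{5ij}) + \varepsilon_{ij}
\end{aligned}
$$

where $\varepsilon_{ij} \sim \text{i.i.d. } N(0, \sigma^2)$.

Stage 2:

$$r^{95}_{ij} = \beta_0 + \beta_1 r^{94}_{1ij} + \beta_2 r^{94}_{2ij} + \delta_{ij}$$

where $\delta_{ij} \sim N(0, \sigma^2)$.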
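As a sketch of how the interaction columns of the full stage 1 equation (terms 11 through 20) could be assembled from the five student-level indicators, assuming each is a 0/1 NumPy array; the function name and the two example students are hypothetical.

```python
import numpy as np

def stage1_interactions(x1, x2, x3, x4, x5):
    """Return the ten interaction columns (terms 11-20) of the full stage 1 equation.

    x1, x2, x3 are the Black, Hispanic, and LEP indicators; x4 is gender and
    x5 is free or reduced lunch status, as defined in the variable list.
    """
    first_order = [x1 * x4, x2 * x4, x3 * x4, x1 * x5, x2 * x5, x3 * x5, x4 * x5]
    second_order = [x1 * x4 * x5, x2 * x4 * x5, x3 * x4 * x5]
    return np.column_stack(first_order + second_order)

# Two hypothetical students:
#   student A: x1=1, x4=1 (male), x5=1 (subsidized lunch)
#   student B: x2=1, x3=1, x4=0 (female), x5=1
x1 = np.array([1.0, 0.0])
x2 = np.array([0.0, 1.0])
x3 = np.array([0.0, 1.0])
x4 = np.array([1.0, 0.0])
x5 = np.array([1.0, 1.0])
interactions = stage1_interactions(x1, x2, x3, x4, x5)
```

Because all five indicators are binary, each interaction column is itself a 0/1 indicator for membership in the corresponding combined subgroup.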

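HLM-FULL (2.0)

Stage 1:

$$
\begin{aligned}
Y_{ij} ={}& \Lambda_0 + \Lambda_1 X_{1ij} + \Lambda_2 X_{2ij} + \Lambda_3 X_{3ij} + \Lambda_4 X_{4ij} + \Lambda_5 X_{5ij} + \Lambda_6 X_{6ij} + \Lambda_7 X_{7ij} + \Lambda_8 X_{8ij} + \Lambda_9 X_{9ij} + \Lambda_{10} X_{10ij} \\
&+ \Lambda_{11}(X_{1ij}X_{4ij}) + \Lambda_{12}(X_{2ij}X_{4ij}) + \Lambda_{13}(X_{3ij}X_{4ij}) + \Lambda_{14}(X_{1ij}X_{5ij}) + \Lambda_{15}(X_{2ij}X_{5ij}) \\
&+ \Lambda_{16}(X_{3ij}X_{5ij}) + \Lambda_{17}(X_{4ij}X_{5ij}) + \Lambda_{18}(X_{1ij}X_{4ij}X_{5ij}) + \Lambda_{19}(X_{2ij}X_{4ij}X_{5ij}) + \Lambda_{20}(X_{3ij}X_{4ij}X_{5ij}) + \varepsilon_{ij}
\end{aligned}
$$

where $\varepsilon_{ij} \sim \text{i.i.d. } N(0, \sigma^2)$.

Stage 2:

$$r^{95}_{ij} = \beta_{0j} + \beta_{1j} r^{94}_{1ij} + \beta_{2j} r^{94}_{2ij} + \delta_{ij}$$

and

$$\beta_{kj} = \gamma_{k0} + u_{kj}$$

for $i = 1, 2, \ldots, I_j$; $j = 1, 2, \ldots, J$; $k = 0, 1, 2$,

where $E(\delta_{ij}) = 0$, $\mathrm{Var}(\delta_{ij}) = \sigma^2$, $E(u_{kj}) = 0$, $\mathrm{Var}(u_{kj}) = \sigma^2$, and $\delta_{ij} \perp u_{kj}$.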



3 In all regression procedures for both stage 1 and stage 2, arrays were standardized to assure that schools with
unusual numbers of students in certain areas of predictor space were not rated based upon differential variance in 
different arrays. 



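DALLAS-MC (3.0)

Stage 1:

$$
\begin{aligned}
Y_{ij} ={}& \Lambda_0 + \Lambda_1 X_{1ij} + \Lambda_2 X_{2ij} + \Lambda_3 X_{3ij} + \Lambda_4 X_{4ij} + \Lambda_5 X_{5ij} + \Lambda_8 X_{8ij} + \Lambda_9 X_{9ij} + \Lambda_{10} X_{10ij} \\
&+ \Lambda_{11}(X_{1ij}X_{4ij}) + \Lambda_{12}(X_{2ij}X_{4ij}) + \Lambda_{13}(X_{3ij}X_{4ij}) + \Lambda_{14}(X_{1ij}X_{5ij}) + \Lambda_{15}(X_{2ij}X_{5ij}) \\
&+ \Lambda_{16}(X_{3ij}X_{5ij}) + \Lambda_{17}(X_{4ij}X_{5ij}) + \Lambda_{18}(X_{1ij}X_{4ij}X_{5ij}) + \Lambda_{19}(X_{2ij}X_{4ij}X_{5ij}) + \Lambda_{20}(X_{3ij}X_{4ij}X_{5ij}) + \varepsilon_{ij}
\end{aligned}
$$

where $\varepsilon_{ij} \sim \text{i.i.d. } N(0, \sigma^2)$.

Stage 2:

$$r^{95}_{ij} = \beta_0 + \beta_1 r^{94}_{1ij} + \beta_2 r^{94}_{2ij} + \delta_{ij}$$

where $\delta_{ij} \sim N(0, \sigma^2)$.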



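HLM-MC (4.0)

Stage 1:

$$
\begin{aligned}
Y_{ij} ={}& \Lambda_0 + \Lambda_1 X_{1ij} + \Lambda_2 X_{2ij} + \Lambda_3 X_{3ij} + \Lambda_4 X_{4ij} + \Lambda_5 X_{5ij} + \Lambda_8 X_{8ij} + \Lambda_9 X_{9ij} + \Lambda_{10} X_{10ij} \\
&+ \Lambda_{11}(X_{1ij}X_{4ij}) + \Lambda_{12}(X_{2ij}X_{4ij}) + \Lambda_{13}(X_{3ij}X_{4ij}) + \Lambda_{14}(X_{1ij}X_{5ij}) + \Lambda_{15}(X_{2ij}X_{5ij}) \\
&+ \Lambda_{16}(X_{3ij}X_{5ij}) + \Lambda_{17}(X_{4ij}X_{5ij}) + \Lambda_{18}(X_{1ij}X_{4ij}X_{5ij}) + \Lambda_{19}(X_{2ij}X_{4ij}X_{5ij}) + \Lambda_{20}(X_{3ij}X_{4ij}X_{5ij}) + \varepsilon_{ij}
\end{aligned}
$$

where $\varepsilon_{ij} \sim \text{i.i.d. } N(0, \sigma^2)$.

Stage 2:

$$r^{95}_{ij} = \beta_{0j} + \beta_{1j} r^{94}_{1ij} + \beta_{2j} r^{94}_{2ij} + \delta_{ij}$$

$$\beta_{kj} = \gamma_{k0} + \gamma_{k1} W_{1j} + \gamma_{k2} W_{2j} + u_{kj}$$

for $i = 1, 2, \ldots, I_j$; $j = 1, 2, \ldots, J$; $k = 0, 1, 2$,

where $E(\delta_{ij}) = 0$, $\mathrm{Var}(\delta_{ij}) = \sigma^2$, $E(u_{kj}) = 0$, $\mathrm{Var}(u_{kj}) = \sigma^2$, and $\delta_{ij} \perp u_{kj}$.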



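DALLAS-MCL (5.0)

Stage 1:

$$
\begin{aligned}
Y_{ij} ={}& \Lambda_0 + \Lambda_1 X_{1ij} + \Lambda_2 X_{2ij} + \Lambda_3 X_{3ij} + \Lambda_4 X_{4ij} + \Lambda_8 X_{8ij} + \Lambda_9 X_{9ij} + \Lambda_{10} X_{10ij} \\
&+ \Lambda_{11}(X_{1ij}X_{4ij}) + \Lambda_{12}(X_{2ij}X_{4ij}) + \Lambda_{13}(X_{3ij}X_{4ij}) + \varepsilon_{ij}
\end{aligned}
$$

where $\varepsilon_{ij} \sim \text{i.i.d. } N(0, \sigma^2)$.

Stage 2:

$$r^{95}_{ij} = \beta_0 + \beta_1 r^{94}_{1ij} + \beta_2 r^{94}_{2ij} + \delta_{ij}$$

where $\delta_{ij} \sim N(0, \sigma^2)$.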



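HLM-MCL (6.0)

Stage 1:

$$
\begin{aligned}
Y_{ij} ={}& \Lambda_0 + \Lambda_1 X_{1ij} + \Lambda_2 X_{2ij} + \Lambda_3 X_{3ij} + \Lambda_4 X_{4ij} + \Lambda_8 X_{8ij} + \Lambda_9 X_{9ij} + \Lambda_{10} X_{10ij} \\
&+ \Lambda_{11}(X_{1ij}X_{4ij}) + \Lambda_{12}(X_{2ij}X_{4ij}) + \Lambda_{13}(X_{3ij}X_{4ij}) + \varepsilon_{ij}
\end{aligned}
$$

where $\varepsilon_{ij} \sim \text{i.i.d. } N(0, \sigma^2)$.

Stage 2:

$$r^{95}_{ij} = \beta_{0j} + \beta_{1j} r^{94}_{1ij} + \beta_{2j} r^{94}_{2ij} + \delta_{ij}$$

$$\beta_{kj} = \gamma_{k0} + \gamma_{k1} W_{1j} + \gamma_{k2} W_{2j} + \gamma_{k6} W_{6j} + u_{kj}$$

for $i = 1, 2, \ldots, I_j$; $j = 1, 2, \ldots, J$; $k = 0, 1, 2$,

where $E(\delta_{ij}) = 0$, $\mathrm{Var}(\delta_{ij}) = \sigma^2$, $E(u_{kj}) = 0$, $\mathrm{Var}(u_{kj}) = \sigma^2$, and $\delta_{ij} \perp u_{kj}$.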

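HLM-MCC01 (7.0)

Stage 1:

$$
\begin{aligned}
Y_{ij} ={}& \Lambda_0 + \Lambda_1 X_{1ij} + \Lambda_2 X_{2ij} + \Lambda_3 X_{3ij} + \Lambda_4 X_{4ij} + \Lambda_5 X_{5ij} \\
&+ \Lambda_{11}(X_{1ij}X_{4ij}) + \Lambda_{12}(X_{2ij}X_{4ij}) + \Lambda_{13}(X_{3ij}X_{4ij}) + \Lambda_{14}(X_{1ij}X_{5ij}) + \Lambda_{15}(X_{2ij}X_{5ij}) \\
&+ \Lambda_{16}(X_{3ij}X_{5ij}) + \Lambda_{17}(X_{4ij}X_{5ij}) + \Lambda_{18}(X_{1ij}X_{4ij}X_{5ij}) + \Lambda_{19}(X_{2ij}X_{4ij}X_{5ij}) + \Lambda_{20}(X_{3ij}X_{4ij}X_{5ij}) + \varepsilon_{ij}
\end{aligned}
$$

where $\varepsilon_{ij} \sim \text{i.i.d. } N(0, \sigma^2)$.

Stage 2:

$$r^{95}_{ij} = \beta_{0j} + \beta_{1j} r^{94}_{1ij} + \beta_{2j} r^{94}_{2ij} + \delta_{ij}$$

$$\beta_{kj} = \gamma_{k0} + \gamma_{k1} W_{1j} + \gamma_{k2} W_{2j} + \gamma_{k3} W_{3j} + \gamma_{k4} W_{4j} + \gamma_{k5} W_{5j} + u_{kj}$$

for $i = 1, 2, \ldots, I_j$; $j = 1, 2, \ldots, J$; $k = 0, 1, 2$,

where $E(\delta_{ij}) = 0$, $\mathrm{Var}(\delta_{ij}) = \sigma^2$, $E(u_{kj}) = 0$, $\mathrm{Var}(u_{kj}) = \sigma^2$, and $\delta_{ij} \perp u_{kj}$.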

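HLM-MCC02 (8.0)

Stage 1:

$$
\begin{aligned}
Y_{ij} ={}& \Lambda_0 + \Lambda_1 X_{1ij} + \Lambda_2 X_{2ij} + \Lambda_3 X_{3ij} + \Lambda_4 X_{4ij} + \Lambda_5 X_{5ij} \\
&+ \Lambda_{11}(X_{1ij}X_{4ij}) + \Lambda_{12}(X_{2ij}X_{4ij}) + \Lambda_{13}(X_{3ij}X_{4ij}) + \Lambda_{14}(X_{1ij}X_{5ij}) + \Lambda_{15}(X_{2ij}X_{5ij}) \\
&+ \Lambda_{16}(X_{3ij}X_{5ij}) + \Lambda_{17}(X_{4ij}X_{5ij}) + \Lambda_{18}(X_{1ij}X_{4ij}X_{5ij}) + \Lambda_{19}(X_{2ij}X_{4ij}X_{5ij}) + \Lambda_{20}(X_{3ij}X_{4ij}X_{5ij}) + \varepsilon_{ij}
\end{aligned}
$$

where $\varepsilon_{ij} \sim \text{i.i.d. } N(0, \sigma^2)$.

Stage 2:

$$r^{95}_{ij} = \beta_{0j} + \beta_{1j} r^{94}_{1ij} + \beta_{2j} r^{94}_{2ij} + \delta_{ij}$$

$$\beta_{kj} = \gamma_{k0} + \gamma_{k1} W_{1j} + \gamma_{k2} W_{2j} + \gamma_{k3} W_{3j} + \gamma_{k4} W_{4j} + \gamma_{k5} W_{5j} + \gamma_{k6} W_{6j} + u_{kj}$$

for $i = 1, 2, \ldots, I_j$; $j = 1, 2, \ldots, J$; $k = 0, 1, 2$,

where $E(\delta_{ij}) = 0$, $\mathrm{Var}(\delta_{ij}) = \sigma^2$, $E(u_{kj}) = 0$, $\mathrm{Var}(u_{kj}) = \sigma^2$, and $\delta_{ij} \perp u_{kj}$.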
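HLM-MCC03 (9.0)

Stage 1:

$$
\begin{aligned}
Y_{ij} ={}& \Lambda_0 + \Lambda_1 X_{1ij} + \Lambda_2 X_{2ij} + \Lambda_3 X_{3ij} + \Lambda_4 X_{4ij} + \Lambda_5 X_{5ij} \\
&+ \Lambda_{11}(X_{1ij}X_{4ij}) + \Lambda_{12}(X_{2ij}X_{4ij}) + \Lambda_{13}(X_{3ij}X_{4ij}) + \Lambda_{14}(X_{1ij}X_{5ij}) + \Lambda_{15}(X_{2ij}X_{5ij}) \\
&+ \Lambda_{16}(X_{3ij}X_{5ij}) + \Lambda_{17}(X_{4ij}X_{5ij}) + \Lambda_{18}(X_{1ij}X_{4ij}X_{5ij}) + \Lambda_{19}(X_{2ij}X_{4ij}X_{5ij}) + \Lambda_{20}(X_{3ij}X_{4ij}X_{5ij}) + \varepsilon_{ij}
\end{aligned}
$$

where $\varepsilon_{ij} \sim \text{i.i.d. } N(0, \sigma^2)$.

Stage 2:

$$r^{95}_{ij} = \beta_{0j} + \beta_{1j} r^{94}_{1ij} + \beta_{2j} r^{94}_{2ij} + \delta_{ij}$$

$$\beta_{kj} = \gamma_{k0} + \gamma_{k1} W_{1j} + \gamma_{k2} W_{2j} + \gamma_{k3} W_{3j} + \gamma_{k4} W_{4j} + \gamma_{k5} W_{5j} + \gamma_{k6} W_{6j} + \gamma_{k7} W_{7j} + u_{kj}$$

for $i = 1, 2, \ldots, I_j$; $j = 1, 2, \ldots, J$; $k = 0, 1, 2$,

where $E(\delta_{ij}) = 0$, $\mathrm{Var}(\delta_{ij}) = \sigma^2$, $E(u_{kj}) = 0$, $\mathrm{Var}(u_{kj}) = \sigma^2$, and $\delta_{ij} \perp u_{kj}$.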

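HLM-MCC04 (10.0)

Stage 1:

$$
\begin{aligned}
Y_{ij} ={}& \Lambda_0 + \Lambda_1 X_{1ij} + \Lambda_2 X_{2ij} + \Lambda_3 X_{3ij} + \Lambda_4 X_{4ij} + \Lambda_5 X_{5ij} \\
&+ \Lambda_{11}(X_{1ij}X_{4ij}) + \Lambda_{12}(X_{2ij}X_{4ij}) + \Lambda_{13}(X_{3ij}X_{4ij}) + \Lambda_{14}(X_{1ij}X_{5ij}) + \Lambda_{15}(X_{2ij}X_{5ij}) \\
&+ \Lambda_{16}(X_{3ij}X_{5ij}) + \Lambda_{17}(X_{4ij}X_{5ij}) + \Lambda_{18}(X_{1ij}X_{4ij}X_{5ij}) + \Lambda_{19}(X_{2ij}X_{4ij}X_{5ij}) + \Lambda_{20}(X_{3ij}X_{4ij}X_{5ij}) + \varepsilon_{ij}
\end{aligned}
$$

where $\varepsilon_{ij} \sim \text{i.i.d. } N(0, \sigma^2)$.

Stage 2:

$$r^{95}_{ij} = \beta_{0j} + \beta_{1j} r^{94}_{1ij} + \beta_{2j} r^{94}_{2ij} + \delta_{ij}$$

$$
\begin{aligned}
\beta_{kj} ={}& \gamma_{k0} + \gamma_{k1} W_{1j} + \gamma_{k2} W_{2j} + \gamma_{k3} W_{3j} + \gamma_{k4} W_{4j} + \gamma_{k5} W_{5j} + \gamma_{k6} W_{6j} + \gamma_{k7} W_{7j} + \gamma_{k8} W_{8j} + \gamma_{k9} W_{9j} \\
&+ u_{kj}
\end{aligned}
$$

for $i = 1, 2, \ldots, I_j$; $j = 1, 2, \ldots, J$; $k = 0, 1, 2$,

where $E(\delta_{ij}) = 0$, $\mathrm{Var}(\delta_{ij}) = \sigma^2$, $E(u_{kj}) = 0$, $\mathrm{Var}(u_{kj}) = \sigma^2$, and $\delta_{ij} \perp u_{kj}$.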
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HLM-MCC05 (11.01 



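Stage 1:

$$
\begin{aligned}
Y_{ij} ={}& \Lambda_0 + \Lambda_1 X_{1ij} + \Lambda_2 X_{2ij} + \Lambda_3 X_{3ij} + \Lambda_4 X_{4ij} + \Lambda_5 X_{5ij} \\
&+ \Lambda_{11}(X_{1ij}X_{4ij}) + \Lambda_{12}(X_{2ij}X_{4ij}) + \Lambda_{13}(X_{3ij}X_{4ij}) \\
&+ \Lambda_{14}(X_{1ij}X_{5ij}) + \Lambda_{15}(X_{2ij}X_{5ij}) + \Lambda_{16}(X_{3ij}X_{5ij}) + \Lambda_{17}(X_{4ij}X_{5ij}) \\
&+ \Lambda_{18}(X_{1ij}X_{4ij}X_{5ij}) + \Lambda_{19}(X_{2ij}X_{4ij}X_{5ij}) + \Lambda_{20}(X_{3ij}X_{4ij}X_{5ij}) + \varepsilon_{ij}
\end{aligned}
$$

where $\varepsilon_{ij} \sim \text{i.i.d. } N(0, \sigma^2)$.

Stage 2:

$$ r^{95}_{ij} = \beta_{0j} + \beta_{1j} r^{94}_{1ij} + \beta_{2j} r^{94}_{2ij} + \delta_{ij} $$

$$
\begin{aligned}
\beta_{kj} ={}& \gamma_{k0} + \gamma_{k1}W_{1j} + \gamma_{k2}W_{2j} + \gamma_{k3}W_{3j} + \gamma_{k4}W_{4j} + \gamma_{k5}W_{5j} + \gamma_{k6}W_{6j} \\
&+ \gamma_{k7}W_{7j} + \gamma_{k8}W_{8j} + \gamma_{k9}W_{9j} + \gamma_{k10}W_{10j} + u_{kj}
\end{aligned}
$$

for $i = 1, 2, \ldots, I_j$; $j = 1, 2, \ldots, J$; $k = 0, 1, 2$,

where $E(\delta_{ij}) = 0$, $\mathrm{Var}(\delta_{ij}) = \sigma^2$, $E(u_{kj}) = 0$, $\mathrm{Var}(u_{kj}) = \sigma^2$, and $\delta_{ij} \perp u_{kj}$.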
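The Stage 1 fairness equation shared by the school level models uses five student indicators, seven two-way interactions, and three three-way interactions. The following is a minimal sketch of how one row of regressors could be assembled for a single student. The function name `stage1_design_row` and the leading intercept column are illustrative assumptions, not part of the original specification.

```python
def stage1_design_row(x1, x2, x3, x4, x5):
    """Return the regressors for one student, ordered as in the Stage 1 equation.

    x1..x5 are the student-level indicators X1..X5 (coded 0/1).
    The leading 1 is the intercept column (an assumption of this sketch).
    """
    main_effects = [x1, x2, x3, x4, x5]                    # coefficients 1..5
    two_way = [x1 * x4, x2 * x4, x3 * x4,                  # coefficients 11..13
               x1 * x5, x2 * x5, x3 * x5,                  # coefficients 14..16
               x4 * x5]                                    # coefficient 17
    three_way = [x1 * x4 * x5, x2 * x4 * x5, x3 * x4 * x5]  # coefficients 18..20
    return [1] + main_effects + two_way + three_way


# Hypothetical student with X1 = 1, X4 = 1, X5 = 1 and X2 = X3 = 0.
example_row = stage1_design_row(1, 0, 0, 1, 1)
```

Stacking one such row per student gives the design matrix for an ordinary least squares fit; the residuals from that fit are what Stage 2 takes as input.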

Figure 1 summarizes the various school level models. 

Teacher Effects 4 



In attempting to attribute teacher effects seven different models were examined. The 
question of interest involves the complexity of equations that one must implement in order to 
produce reliable and valid results. If one could limit the equations to a two-level HLM with the 
second level being the school level with adjustment for shrinkage to estimate teacher effects one 
would be able to quality control the system better than if one had to have a different equation or 
equations for each teacher. With parsimony in mind, the following equations were examined for 
efficiency in attributing teacher effect. 



4 Teacher Effectiveness Indices are used as part of the needs assessment in the teacher evaluation system. 
Teachers are not evaluated based on effectiveness indices. 

At this point another reference index to student test scores and residuals from stage 1 was 
added. To wit: 

$itj$ = the $i$th student with the $t$th teacher in school $j$, 
for $i = 1, 2, \ldots, I_{tj}$, 
for $t = 1, 2, \ldots, T_j$, 
for $j = 1, 2, \ldots, J$. 

This new index does not affect the model specifications in the previous equations. The student 
residuals from stage 1 were calculated as $r_{itj} = Y_{itj} - \hat{Y}_{itj}$, for one 1995 and two 1994 student test 
scores, and were used at stage 2 to obtain the predicted score, $\hat{r}^{95}_{itj}$, for $r^{95}_{itj}$. 

The residual from stage 2 for the $i$th student of the $t$th teacher in the $j$th school was 

$$ s_{itj} = r^{95}_{itj} - \hat{r}^{95}_{itj}. $$

$s_{itj}$ was used to calculate the TEIs; 

$I_{tj}$ is the number of students for teacher $t$ in school $j$. 

DALLAS FULL & HLM-MCC05 (1.0 and 11.0) 

The residuals $s_{itj}$ were aggregated with respect to teachers as follows: 

$$ \bar{s}_{tj} = \frac{s_{\cdot tj}}{I_{tj}} = \frac{\sum_{i=1}^{I_{tj}} s_{itj}}{I_{tj}} $$

The TEI for teacher $t$ in school $j$ = $\bar{s}_{tj}$. 

DALLAS FULL & HLM-MCC05 with Shrinkage adjustment (1.0 and 11.0) 



Method A: 

Each student residual was treated as an outcome predicting a teacher’s performance. 
Hence each teacher has as many performance indicators as students she/he taught. A student 
may have counted twice or more, once for each course the teacher taught that student. 

Let 

$$ \hat{\mu} = \frac{\sum_{j=1}^{J}\sum_{t=1}^{T_j}\sum_{i=1}^{I_{tj}} s_{itj}}{\sum_{j=1}^{J}\sum_{t=1}^{T_j} I_{tj}} $$

$$ \hat{\sigma}^2 = \frac{\sum_{j=1}^{J}\sum_{t=1}^{T_j}\sum_{i=1}^{I_{tj}} (s_{itj} - \hat{\mu})^2}{\sum_{j=1}^{J}\sum_{t=1}^{T_j} I_{tj}} $$

To calculate the Best Linear Unbiased Prediction (B.L.U.P.) of $\bar{s}_{tj}$ for the $t$th teacher in school $j$, 

Let 

$$ \bar{s}_{tj} = \frac{\sum_{i=1}^{I_{tj}} s_{itj}}{I_{tj}}, \qquad v_{tj} = \frac{\sum_{i=1}^{I_{tj}} (s_{itj} - \bar{s}_{tj})^2}{I_{tj}}, $$

where $v_{tj}$ is the error variance for the TEI for teacher $t$ in school $j$; 
then 

$$ \mathrm{TEI}_{tj} = \hat{\mu} + (\bar{s}_{tj} - \hat{\mu}) \left( \frac{\hat{\sigma}^2}{\hat{\sigma}^2 + \dfrac{v_{tj}}{I_{tj}}} \right) $$

If $v_{tj}/I_{tj}$ is large relative to $\hat{\sigma}^2$, the TEI is biased towards the population mean $\hat{\mu}$. 

If $v_{tj}/I_{tj}$ is small relative to $\hat{\sigma}^2$, the TEI is biased towards the sample mean $\bar{s}_{tj}$ since it is 
more stable. 
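The Method A shrinkage adjustment can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `shrunken_teis` and the input layout (a dictionary from a teacher identifier to that teacher's list of student residuals) are hypothetical, all variances divide by the number of observations, and the error term in the denominator is taken to be the within-teacher variance divided by the teacher's number of students.

```python
from statistics import fmean


def shrunken_teis(residuals_by_teacher):
    """Shrink each teacher's mean residual toward the overall mean.

    residuals_by_teacher: dict mapping a teacher id to a list of student
    residuals (hypothetical layout chosen for this sketch).
    """
    all_residuals = [s for res in residuals_by_teacher.values() for s in res]
    mu = fmean(all_residuals)                                   # overall mean
    sigma2 = fmean([(s - mu) ** 2 for s in all_residuals])      # overall variance

    teis = {}
    for teacher, res in residuals_by_teacher.items():
        n = len(res)
        teacher_mean = fmean(res)
        v = fmean([(s - teacher_mean) ** 2 for s in res])       # within-teacher variance
        denom = sigma2 + v / n
        weight = sigma2 / denom if denom > 0 else 0.0           # shrinkage factor in [0, 1]
        teis[teacher] = mu + (teacher_mean - mu) * weight
    return teis


# Hypothetical residuals for two teachers.
example_teis = shrunken_teis({"A": [1.0, 3.0], "B": [-1.0, -3.0]})
```

A teacher with few students or widely scattered residuals receives a smaller weight, so that teacher's index moves closer to the overall mean.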



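Method B: 

Each teacher teaches many classes and many courses. The mean student residuals for 
each course within each class were treated as an outcome predicting a teacher’s performance. 
Thus if a teacher taught two classes, 2A and 2B, in the subjects of ITBS MATH and ITBS 
READING, he/she had 4 performance indicators. 

Each $s_{itj}$ was aggregated by class/course, and $\bar{s}^{\,k}_{tj}$ for course/section $k$ for teacher $t$ in school $j$ 
was obtained. There were $K_{tj}$ of these. 

Let 

$$ \hat{\mu} = \frac{\sum_{j=1}^{J}\sum_{t=1}^{T_j}\sum_{k=1}^{K_{tj}} \bar{s}^{\,k}_{tj}}{\sum_{j=1}^{J}\sum_{t=1}^{T_j} K_{tj}} $$

$$ \hat{\sigma}^2 = \frac{\sum_{j=1}^{J}\sum_{t=1}^{T_j}\sum_{k=1}^{K_{tj}} (\bar{s}^{\,k}_{tj} - \hat{\mu})^2}{\sum_{j=1}^{J}\sum_{t=1}^{T_j} K_{tj}} $$

To calculate the B.L.U.P. of $\bar{s}_{tj}$ for the $t$th teacher in school $j$, let 

$$ \bar{s}_{tj} = \frac{\sum_{k=1}^{K_{tj}} \bar{s}^{\,k}_{tj}}{K_{tj}}, \qquad v_{tj} = \frac{\sum_{k=1}^{K_{tj}} (\bar{s}^{\,k}_{tj} - \bar{s}_{tj})^2}{K_{tj}}, $$

where $v_{tj}$ is the error variance for the TEI for teacher $t$ in school $j$; 
then 

$$ \mathrm{TEI}_{tj} = \hat{\mu} + (\bar{s}_{tj} - \hat{\mu}) \left( \frac{\hat{\sigma}^2}{\hat{\sigma}^2 + \dfrac{v_{tj}}{K_{tj}}} \right) $$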
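The aggregation step that distinguishes Method B from Method A can be sketched as below. The record layout `(teacher, class_id, course, residual)` and the function name `class_course_indicators` are assumptions made for this illustration; the source does not describe a data format.

```python
from collections import defaultdict
from statistics import fmean


def class_course_indicators(records):
    """Average student residuals within each (teacher, class, course) cell.

    records: iterable of (teacher, class_id, course, residual) tuples
    (hypothetical layout). Returns a dict mapping each teacher to the list
    of cell means, i.e. that teacher's performance indicators.
    """
    cells = defaultdict(list)
    for teacher, class_id, course, residual in records:
        cells[(teacher, class_id, course)].append(residual)

    indicators = defaultdict(list)
    for (teacher, _class_id, _course), values in sorted(cells.items()):
        indicators[teacher].append(fmean(values))
    return dict(indicators)


# Hypothetical teacher with classes 2A and 2B, each tested in MATH and READING.
example_records = [
    ("T1", "2A", "MATH", 1.0), ("T1", "2A", "MATH", 3.0),
    ("T1", "2A", "READING", 0.0),
    ("T1", "2B", "MATH", -1.0),
    ("T1", "2B", "READING", 2.0), ("T1", "2B", "READING", 4.0),
]
example_indicators = class_course_indicators(example_records)
```

The resulting lists play the role that the per-student residuals play in Method A, with the number of indicators per teacher replacing the number of students in the shrinkage factor.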



TWO LEVEL HLM with TEACHER as 2nd LEVEL (12.0 and 12.1) 

$Y_{lij}$ = Outcome variable of interest for the $i$th student from teacher $j$; $l$ is a measure of 
grade/subject/year. 

$X_{1ij}, X_{2ij}, \ldots, X_{10ij}$ are the fairness variables for stage 1. 

The teacher level variables are: 

$T_{1ij}$ = Classroom Percent Mobility 

$T_{2ij}$ = Classroom Percent Overcrowdedness 

$T_{3ij}$ = Classroom Average Family Income 

$T_{4ij}$ = Classroom Average Family Education 

$T_{5ij}$ = Classroom Average Family Poverty Index 

$T_{6ij}$ = Classroom Percentage on Free or Reduced Lunch 

$T_{7ij}$ = Classroom Percentage Minority 

$T_{8ij}$ = Classroom Percentage Black 

$T_{9ij}$ = Classroom Percentage Hispanic 

$T_{10ij}$ = Classroom Percentage Limited English Proficient 

$T_{kij}$ indicates the variable $k$ for the $i$th student in classroom $j$, for $i = 1, 2, \ldots, I_j$ and 
$j = 1, 2, \ldots, J$. 

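HLM-T (12.0) 

Stage 1: 

$$
\begin{aligned}
Y_{ij} ={}& \Lambda_0 + \Lambda_1 X_{1ij} + \Lambda_2 X_{2ij} + \Lambda_3 X_{3ij} + \Lambda_4 X_{4ij} + \Lambda_5 X_{5ij} \\
&+ \Lambda_{11}(X_{1ij}X_{4ij}) + \Lambda_{12}(X_{2ij}X_{4ij}) + \Lambda_{13}(X_{3ij}X_{4ij}) \\
&+ \Lambda_{14}(X_{1ij}X_{5ij}) + \Lambda_{15}(X_{2ij}X_{5ij}) + \Lambda_{16}(X_{3ij}X_{5ij}) + \Lambda_{17}(X_{4ij}X_{5ij}) \\
&+ \Lambda_{18}(X_{1ij}X_{4ij}X_{5ij}) + \Lambda_{19}(X_{2ij}X_{4ij}X_{5ij}) + \Lambda_{20}(X_{3ij}X_{4ij}X_{5ij}) + \varepsilon_{ij}
\end{aligned}
$$

where $\varepsilon_{ij} \sim \text{i.i.d. } N(0, \sigma^2)$. 

Stage 2: 

$$ r^{95}_{ij} = \beta_{0j} + \beta_{1j} r^{94}_{1ij} + \beta_{2j} r^{94}_{2ij} + \delta_{ij} $$

and 

$$
\begin{aligned}
\beta_{kj} ={}& \gamma_{k0} + \gamma_{k1}T_{1j} + \gamma_{k2}T_{2j} + \gamma_{k3}T_{3j} + \gamma_{k4}T_{4j} + \gamma_{k5}T_{5j} + \gamma_{k6}T_{6j} \\
&+ \gamma_{k7}T_{7j} + \gamma_{k8}T_{8j} + \gamma_{k9}T_{9j} + u_{kj}
\end{aligned}
$$

for $i = 1, 2, \ldots, I_j$; $j = 1, 2, \ldots, J$; $k = 0, 1, 2$, 

where $E(\delta_{ij}) = 0$, $\mathrm{Var}(\delta_{ij}) = \sigma^2$, $E(u_{kj}) = 0$, $\mathrm{Var}(u_{kj}) = \tau_{3 \times 3}$, and $\delta_{ij} \perp u_{kj}$. 

The TEI for teacher $j$ was obtained from the empirical bayes estimate for $u_{0j}$. 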

A teacher may have had many TEIs from different subjects/courses/classes. These TEIs could 
be combined directly, weighted by n, or combined with a shrinkage adjustment. 
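The first two ways of combining a teacher's several TEIs, a direct average and an average weighted by n, can be sketched as follows. The function name `combine_teis` and the choice to pass the student counts as a parallel list are assumptions made for illustration only.

```python
def combine_teis(teis, counts=None):
    """Combine several TEIs for one teacher into a single index.

    teis:   list of indices, one per subject/course/class.
    counts: optional list of student counts n, parallel to teis.
            If omitted, the indices are averaged directly.
    """
    if not teis:
        raise ValueError("at least one TEI is required")
    if counts is None:
        return sum(teis) / len(teis)                  # combined directly
    if len(counts) != len(teis):
        raise ValueError("counts must be parallel to teis")
    total = sum(counts)
    if total <= 0:
        raise ValueError("total student count must be positive")
    return sum(t * n for t, n in zip(teis, counts)) / total   # weighted by n


# Hypothetical teacher with two indices based on 30 and 10 students.
direct = combine_teis([1.0, 3.0])
weighted = combine_teis([1.0, 3.0], counts=[30, 10])
```

Weighting by n lets the index based on more students dominate, which is the simpler alternative to a full shrinkage adjustment.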

HLM-TC (12.1) 

Stage 1: 

NONE 

Stage 2: 

$$ Y^{95}_{ij} = \beta_{0j} + \beta_{1j} r^{94}_{1ij} + \beta_{2j} r^{94}_{2ij} + \delta_{ij} $$

and 

$$
\begin{aligned}
\beta_{kj} ={}& \gamma_{k0} + \gamma_{k1}T_{1j} + \gamma_{k2}T_{2j} + \gamma_{k3}T_{3j} + \gamma_{k4}T_{4j} + \gamma_{k5}T_{5j} + \gamma_{k6}T_{6j} \\
&+ \gamma_{k7}T_{7j} + \gamma_{k8}T_{8j} + \gamma_{k9}T_{9j} + u_{kj}
\end{aligned}
$$

for $i = 1, 2, \ldots, I_j$; $j = 1, 2, \ldots, J$; $k = 0, 1, 2$, 

where $E(\delta_{ij}) = 0$, $\mathrm{Var}(\delta_{ij}) = \sigma^2$, $E(u_{kj}) = 0$, $\mathrm{Var}(u_{kj}) = \tau_{3 \times 3}$, and $\delta_{ij} \perp u_{kj}$. 

A TEI for teacher $j$ was obtained from the empirical bayes estimate for $u_{0j}$. 

A teacher may have had many TEIs from different subjects/courses/classes. These TEIs could 
be combined directly, weighted by n, or combined with a shrinkage adjustment. 



THREE LEVEL HLM MODEL FOR STUDENT/TEACHER/SCHOOL (13.0 and 13.1) 

The teacher level variables for school $k$ are: 

$T_{1ijk}$ = Classroom Percent Mobility 

$T_{2ijk}$ = Classroom Percent Overcrowdedness 

$T_{3ijk}$ = Classroom Average Family Income 

$T_{4ijk}$ = Classroom Average Family Education 

$T_{5ijk}$ = Classroom Average Family Poverty Index 

$T_{6ijk}$ = Classroom Percentage on Free or Reduced Lunch 

$T_{7ijk}$ = Classroom Percentage Minority 

$T_{8ijk}$ = Classroom Percentage Black 

$T_{9ijk}$ = Classroom Percentage Hispanic 

$T_{10ijk}$ = Classroom Percentage Limited English Proficient 

$T_{pijk}$ indicates the variable $p$ for the $i$th student from classroom $j$ within school $k$, for 
$i = 1, 2, \ldots, I_{jk}$, $j = 1, 2, \ldots, J_k$, and $k = 1, 2, \ldots, K$. 

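HLM-3 (13.0) 

Stage 1: 

$$
\begin{aligned}
Y_{ijk} ={}& \Lambda_0 + \Lambda_1 X_{1ijk} + \Lambda_2 X_{2ijk} + \Lambda_3 X_{3ijk} + \Lambda_4 X_{4ijk} + \Lambda_5 X_{5ijk} \\
&+ \Lambda_{11}(X_{1ijk}X_{4ijk}) + \Lambda_{12}(X_{2ijk}X_{4ijk}) + \Lambda_{13}(X_{3ijk}X_{4ijk}) \\
&+ \Lambda_{14}(X_{1ijk}X_{5ijk}) + \Lambda_{15}(X_{2ijk}X_{5ijk}) + \Lambda_{16}(X_{3ijk}X_{5ijk}) + \Lambda_{17}(X_{4ijk}X_{5ijk}) \\
&+ \Lambda_{18}(X_{1ijk}X_{4ijk}X_{5ijk}) + \Lambda_{19}(X_{2ijk}X_{4ijk}X_{5ijk}) + \Lambda_{20}(X_{3ijk}X_{4ijk}X_{5ijk}) + \varepsilon_{ijk}
\end{aligned}
$$

where $\varepsilon_{ijk} \sim \text{i.i.d. } N(0, \sigma^2)$. 

Stage 2: 

$$ r^{95}_{ijk} = \beta_{0jk} + \beta_{1jk} r^{94}_{1ijk} + \beta_{2jk} r^{94}_{2ijk} + \delta_{ijk} $$

and 

$$
\begin{aligned}
\beta_{pjk} ={}& \gamma_{p0k} + \gamma_{p1k}T_{1jk} + \gamma_{p2k}T_{2jk} + \gamma_{p3k}T_{3jk} + \gamma_{p4k}T_{4jk} + \gamma_{p5k}T_{5jk} + \gamma_{p6k}T_{6jk} \\
&+ \gamma_{p7k}T_{7jk} + \gamma_{p8k}T_{8jk} + \gamma_{p9k}T_{9jk} + u_{pjk}
\end{aligned}
$$

for $p = 0, 1, 2$. 

$$ \gamma_{pqk} = \alpha_{pq0} + \alpha_{pq1}W_{1k} + \alpha_{pq2}W_{2k} + \ldots + \rho_{pqk} $$

where $E(\delta_{ijk}) = 0$, $\mathrm{Var}(\delta_{ijk}) = \sigma^2$, $E(u_{pjk}) = 0$, $\mathrm{Var}(u_{pjk}) = \tau_{3 \times 3}$, $E(\rho_{pqk}) = 0$, $\mathrm{Var}(\rho_{pqk}) = [\ldots]^2$, and 
$\delta_{ijk} \perp u_{pjk} \perp \rho_{pqk}$. 

School Effectiveness Indices were obtained from the empirical bayes residual for $\rho_{00k}$ for school 
$k$, $\rho^{*}_{00k}$. 

Teacher Effectiveness Indices for teacher $j$ within school $k$ were obtained from the EB residual 
for $u_{0jk}$, $u^{*}_{0jk}$. 

Schoolwide TEIs for teacher $j$ are obtained by combining $\rho^{*}_{00k} + u^{*}_{0jk}$. 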

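HLM-3C (13.1) 

Stage 1: 

NONE. 

Stage 2: 

$$ Y^{95}_{ijk} = \beta_{0jk} + \beta_{1jk} r^{94}_{1ijk} + \beta_{2jk} r^{94}_{2ijk} + \delta_{ijk} $$

$$
\begin{aligned}
\beta_{pjk} ={}& \gamma_{p0k} + \gamma_{p1k}T_{1jk} + \gamma_{p2k}T_{2jk} + \gamma_{p3k}T_{3jk} + \gamma_{p4k}T_{4jk} + \gamma_{p5k}T_{5jk} + \gamma_{p6k}T_{6jk} \\
&+ \gamma_{p7k}T_{7jk} + \gamma_{p8k}T_{8jk} + \gamma_{p9k}T_{9jk} + u_{pjk}
\end{aligned}
$$

for $p = 0, 1, 2$. 

$$ \gamma_{pqk} = \alpha_{pq0} + \alpha_{pq1}W_{1k} + \alpha_{pq2}W_{2k} + \ldots + \rho_{pqk} $$

where $E(\delta_{ijk}) = 0$, $\mathrm{Var}(\delta_{ijk}) = \sigma^2$, $E(u_{pjk}) = 0$, $\mathrm{Var}(u_{pjk}) = \tau_{3 \times 3}$, $E(\rho_{pqk}) = 0$, $\mathrm{Var}(\rho_{pqk}) = [\ldots]^2$, and 
$\delta_{ijk} \perp u_{pjk} \perp \rho_{pqk}$. 

Figure 2 summarizes the various teacher level models. 

RESULTS 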



School Effect 

The major objective of this study was to determine an acceptable methodology for 
identifying how effective various schools and teachers were in addressing the major objectives of 
schooling. At the school level effect was defined as the difference between a group of students’ 
performance in a particular school and the performance that would have been expected if those 
students had attended a school with similar context but with practice of average effectiveness. 
Schools were defined as being above average in effectiveness, at average in effectiveness, or 
below average in effectiveness. Because the methodology was designed to define effective 
schools by controlling for factors over which the schools had no control and then to determine 
which schools made the greatest improvement, the degree of consistency among the results 
produced by the various least-squares regressions and HLM models was of major interest. While 
it is obvious that different context and conditioning variables produce different results, it was 
hypothesized that carefully thought out statistical models utilizing the same context and/or 
conditioning variables would produce very similar results. Specifically, we were interested in 
the consistency of results between least-squares regression models that rely on interactions to 
ensure fairness and two-level HLM models that use similar context variables but add 
conditioning variables at the school level. 

Tables 1 and 2 show the correlations between the various models and methods that are 
specified by the aforementioned school level equations and summarized in Figure 1. The 
correlations between the results produced by DALLAS-FULL and HLM-FULL, two comparable 
models, were .9774 in reading and .9633 in mathematics. Similarly, the correlations between the 
results produced by DALLAS-MC and HLM-MC were .9701 in reading and .9212 in 
mathematics. Correlations between DALLAS-MCL and HLM-MCL were .9695 and .9119 in 
reading and mathematics, respectively. Thus the results produced by directly comparable 
ordinary least squares (OLS) models and HLM models were very similar, with shared variance 
(r²) ranging from about 83% (DALLAS-MCL and HLM-MCL in mathematics) to about 96% 
(DALLAS-FULL and HLM-FULL in reading). As the models become increasingly different, the 
correlations drop slightly, although all of the correlations in reading are above .91 and all in 
mathematics are above .86. The two approaches, one regression-based using first and second 
order interactions, and one two-level HLM using Bayesian adjustments to school level regression 
lines, produced very similar results. The eleven different models using 
OLS or HLM methods and slightly different variables also produced very consistent results. 
Consistency of results is very important since this addresses the reliability of different models for 
ranking schools. We would have liked to enter all of the context variables and their interactions 
into the first level of HLM to determine if there were differential effects of contextual variables 
within schools, but with anything but the very simplest of models we couldn’t invert the 
matrices. Thus we were left with the choice of using a contextually rich model that included 
most of the variables that are significantly related to student achievement, or a simpler 
model that excluded many of those variables. We chose the approach of having a first regression 
and prediction stage and computing all of the OLS and HLM models on residuals produced in 
that first stage. 
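The correlations reported above for directly comparable OLS and HLM models can be converted to shared variance by squaring them. The short sketch below does this for the six correlations quoted in the text; the dictionary layout and the name `shared_variance` are illustrative choices.

```python
def shared_variance(r):
    """Proportion of variance two sets of results share, given their correlation r."""
    return r * r


# Correlations quoted in the text for directly comparable OLS and HLM models.
reported_correlations = {
    ("DALLAS-FULL vs HLM-FULL", "reading"): 0.9774,
    ("DALLAS-FULL vs HLM-FULL", "mathematics"): 0.9633,
    ("DALLAS-MC vs HLM-MC", "reading"): 0.9701,
    ("DALLAS-MC vs HLM-MC", "mathematics"): 0.9212,
    ("DALLAS-MCL vs HLM-MCL", "reading"): 0.9695,
    ("DALLAS-MCL vs HLM-MCL", "mathematics"): 0.9119,
}

shared = {key: shared_variance(r) for key, r in reported_correlations.items()}
lowest = min(shared.values())
highest = max(shared.values())
```

Squaring makes the gap between the reading and mathematics comparisons easier to see than the raw correlations do.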

Remembering that the major objective of this methodology was to assure fairness in 
comparing the products of schools, one important consideration was that of whether or not 
individual student background characteristics were related to results. Table 3 presents the 
correlations between results and various student characteristics. Perusal of Table 3 shows that 
these correlations were both practically and statistically insignificant. The only student 
characteristic that was significantly related to outcomes was posttest score, a situation that was 
expected and desired. 

The next important concern was whether or not school level contextual characteristics 
were related to the results produced by the various models. We know from previous research 
that school level contextual characteristics such as percentage of low socioeconomic students 
often correlate with results when individual level contextual characteristics do not (Webster and 
Olson, 1988). That is, it is often more difficult to move a low socioeconomic student immersed 
in a school of low socioeconomic students than it is to move a low socioeconomic student 
enrolled in a school with a number of higher socioeconomic students. Tables 4 and 5 show these 
correlations in reading and mathematics, respectively. These correlations are neither statistically 
nor practically significant, meaning that there was no relationship between the results produced 
by any of these models and the school level variables that were examined. Note that when 
conditioning variables were introduced at the second level in HLM the correlations with those 
context variables were adjusted to 0. The student and school level results meant that schools 
derived no particular advantage from starting with minority or white students, high or low 
socioeconomic level students, limited or non-limited English proficient students, a high or low 
mobile student body, or overcrowded or underutilized facilities. 
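The fairness check described here amounts to correlating the effectiveness indices with each contextual variable and confirming that the correlation is near zero. A minimal sketch follows; the function `pearson_r` is a plain implementation of the Pearson correlation, and the two data lists are hypothetical values invented for illustration, not data from Tables 4 and 5.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)


# Hypothetical school indices and a hypothetical school-level context variable.
school_index = [1.0, -1.0, 1.0, -1.0]
pct_low_income = [10.0, 10.0, 20.0, 20.0]
bias_r = pearson_r(school_index, pct_low_income)
```

A value of `bias_r` close to zero is what the text means by results that are free from bias relative to a contextual variable.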

Table 6 displays the correlations of the results provided by the various models and 
predictor variables (reading residuals 94 and math residuals 94), criterion variables (reading 
residuals 95 and math residuals 95), and predicted scores. All correlations with predictors were 
zero, with criteria were significant as expected, and with predicted scores were slightly above 
zero but statistically and practically non-significant. This means that whether or not a particular 
student was below, at, or above prediction was not related to the level of the pretest score and 
therefore that schools derived no particular advantage by starting with high-scoring or low- 
scoring students. 

All things considered, it is important that both student level and school level contextual 
information be included in models for identifying effective schools. While it may be desirable to 
include this information in the first level of HLM, the authors were unable to enter sufficient 
numbers of background variables into the HLM models to reflect the complex nature of these 
inter-related variables. Rather than oversimplify the models to accommodate a small subset of 
important context variables within the confines of HLM, a preliminary regression stage was 
utilized to control for the effects of important context variables. This, in conjunction with a two- 
stage HLM model, produced minimal correlation between residuals and student level context 
variables and zero correlation between residuals and school level context variables. Specifically, 
HLM-MCC05 appears to be the model of choice for determining school effect. 

Teacher Effect 



Tables 7 and 8 show the correlations between the various models and methods that are 
specified by the aforementioned teacher level equations and summarized in Figure 2. As can be 
seen from the data presented in Tables 7 and 8, HLM-MCC05 and HLM-MCC05 B.L.U.P. 
produced very different results from either the DALLAS-FULL, the STUDENT-TEACHER 
TWO LEVEL MODEL, or the THREE LEVEL STUDENT-TEACHER-SCHOOL MODEL (r < 
.75). The reason for this is rather straightforward. The HLM-MCC05 models are school level 
models with the initial student-level equations being calculated within schools. The conditioning 
variables are school-level conditioning variables that adjust schools’ slopes and intercepts for 
school characteristics. The empirical bayes residuals produced from these equations rank 
teachers within schools, not across the District. Since we are primarily interested in ranking 
teachers across the District, not within schools, the HLM-MCC05 models are not appropriate for 
this purpose. If, however, the school effect is added back into the empirical bayes residuals 
produced by the HLM-MCC05 equations, the results produced are much more in line with the 
two level student-teacher and three level student-teacher-school HLM models (r > .94). This 
model is labeled MCC Res+EBbO and includes the following adjustment to $s_{itj}$: 

$$ s_{itj} = r^{95}_{itj} - \hat{r}^{95adj}_{itj} $$

$$ \hat{r}^{95adj}_{itj} = \hat{r}^{95}_{itj} + u^{*}_{00} $$

where $u^{*}_{00}$ is the empirical bayes residual for school $j$. 
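The adjustment above can be written as a one-line computation per student. The sketch below follows the sign convention as printed in the adjustment equations; the function and argument names are hypothetical, and the numeric inputs are invented for illustration.

```python
def adjusted_residual(r95, r95_predicted, school_eb_residual):
    """Student residual after the school EB residual is folded into the prediction.

    r95:                observed 1995 residual score from the fairness stage
    r95_predicted:      stage 2 predicted score for that student
    school_eb_residual: empirical bayes residual for the student's school
    """
    r95_predicted_adj = r95_predicted + school_eb_residual   # adjusted prediction
    return r95 - r95_predicted_adj                            # adjusted student residual


# Hypothetical student: observed 5.0, predicted 3.0, school EB residual 0.5.
example_s = adjusted_residual(5.0, 3.0, 0.5)
```

Because the school residual is the same for every student in a school, this shifts all of a school's student residuals by a constant, which is what moves teacher comparisons from within schools to across the District.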



The question then becomes one of which model to use in parsimoniously estimating 
teacher effect. The advantages of DALLAS FULL, DALLAS FULL-B.L.U.P., HLM-MCC05, 
HLM-MCC05 B.L.U.P., and MCC Res+EBbO are that the equations can be calculated at either 
the school or the district level, that the number of relevant predictor and conditioning variables is 
not limited by the methodology as is the case with the three level models, and that all students 
can be included in the calculations, thus allowing most teachers to have indices. The two level 
student-teacher model and the three-level student-teacher-school model were used as the standard 
for judging the other models since we believe they produce the best models of teacher effect. We 
would, however, prefer not to use these models in actual practice since, due to degrees of 
freedom issues involved with individual teachers, they effectively eliminated from consideration 
about 20% of teachers who should have had indices. 



One interesting factor in these deliberations is that there was very little within teacher 
variance in the student residuals. This suggests that school effect is really an aggregate teacher 
effect in that, within schools, there was relatively great between teacher variance in student 
residuals coupled with little within teacher variance (see Tables 9 and 10). When one examines 
correlations of results provided by the various models with important teacher level variables 
(Tables 11 and 12), all correlations except those with class size were statistically and practically 
non-significant. This means that the various models produced results that are free from bias 
relative to important classroom level contextual variables. (Class size was not entered into the 
equations.) 



Either of two models produced sufficiently consistent results to be used for estimating 
teacher effect. DALLAS FULL B.L.U.P. (Least Squares Regression with adjustment for 
shrinkage) produced results that correlated .9355 and .9140 with the two level student-teacher 
model and .9120 and .9128 with the three level student-teacher-school model in reading and 
mathematics, respectively. MCC Res+EBbO (Two-level student-school HLM with adjustment 
for shrinkage and with school effect added to the teacher level empirical bayes residuals) 
produced results that correlated .9684 and .9451 with the two level student-teacher model and 
.9754 and .9873 with the three level student-teacher-school model in reading and mathematics 
respectively. Thus we believe that the two level student-school HLM model with adjustment for 
shrinkage and with school effect added to the teacher level empirical bayes residual produced 
sufficiently consistent results with those produced by the student-teacher and 
student-teacher-school HLM models to be used in the estimation of teacher effect. The resulting equations can 
use all available student data and produce indices for the majority of basic skills teachers. 

The efficacy of using MCC Res+EBbO for determining teacher effect is further supported 
by an examination of the amount of variance accounted for by each of the models. Table 13 
displays the R²s for each of the models. When examining data, two R²s are important for each 
model. The first column for reading and mathematics displays the R²s from the first fairness 
stage. In the case of DALLAS-FULL the first stage accounted for 16.96% of the variance in 
reading. The second stage accounted for an additional 44.67% of the remaining 83.4% of the 
variance. Thus, between the first and second stage, DALLAS-FULL accounted for 70.75% of 
the variance in reading. Similar calculations yield the amount of variance accounted for by each 
of the models. When one examines HLM-MCC05, the base model for MCC Res+EBbO, one 
determines that one first needs to use the fairness equation from the first stage of DALLAS- 
FULL and HLM FULL, that is, add average parental income, mobility, and overcrowding back 
in at the student level. When this is done HLM-MCC05 accounts for over 70% of the variance in 
both reading and mathematics. This is very close to the variance accounted for by the two level 
student-teacher HLM equations and the three level student-teacher-school HLM equations. 
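The way the two R² values combine follows from the description above: the second stage explains a share of whatever variance the first stage left unexplained. A minimal sketch of that arithmetic is given below. The function name `combined_r2` is hypothetical, and the inputs in the example are illustrative round numbers, not values from Table 13.

```python
def combined_r2(r2_stage1, r2_stage2):
    """Total proportion of variance explained by two sequential stages.

    r2_stage1: proportion of total variance explained by the fairness stage.
    r2_stage2: proportion of the REMAINING variance explained by the second stage.
    """
    for value in (r2_stage1, r2_stage2):
        if not 0.0 <= value <= 1.0:
            raise ValueError("R-squared values must lie between 0 and 1")
    remaining = 1.0 - r2_stage1
    return r2_stage1 + remaining * r2_stage2


# Illustrative inputs only: stage 1 explains 20%, stage 2 explains half of the rest.
example_total = combined_r2(0.20, 0.50)
```

The same function applies to each model in turn once its two stage-level R² values are read from the table.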

SUMMARY and DISCUSSION 

Several observations appear relevant based on this study. First, and perhaps most 
important, OLS analysis including first and second order interactions and two-level HLM 
analysis produced very similar results at both the school (reading, r = .9774; mathematics, 
r = .9633) and teacher (reading, r = .9530; mathematics, r = .9338) levels. At the teacher level, 
however, the two-level HLM model with adjustment for shrinkage and with school effect added 
back into the equations was the model of choice since the results produced by that model 
correlated very highly with the results produced by two level student-teacher and three level 
student-teacher-school HLM models (reading, r = .9684 and .9754, respectively; mathematics, 
r = .9451 and .9873, respectively). 

Because the most prevalent method of rating schools is either on absolute test scores or 
on unadjusted gain scores, it is important to repeat the results of previous studies that 
demonstrated that such rating systems are biased against schools with higher than average 
minority and poor student populations. There are no existing methodological fixes for this short 
of the use of appropriate statistical models. The fact that the average educator does not 
comprehend OLS or HLM is no excuse for rating schools or teachers in a haphazard manner that 
is demonstrably wrong. 

Although a previous study (Webster et al., 1995) demonstrated that the use of two years 
of achievement data without contextual variables included in the equations produced results that 
were different from the results produced by the equations included in this study and that were 
biased against schools that contained higher than average numbers of Black and economically 
poor students, we are going to continue investigations in this area. These investigations will 
include adding a third and fourth year to the prediction and adding contextual variables. Since 
adding additional areas of matched test scores will significantly reduce the number of students 
eligible for the analysis, investigations into the use of Bayesian estimation to estimate missing 
data will also be carried out. 

Meanwhile, the models that will be used in Dallas for ranking schools and teachers are as 
follows. 



$Y_{lij}$ = Outcome variable of interest for each student $i$ in school $j$; $l$ is a measure for 
grade/subject/year. 

$X_{1ij}$ = Black English Proficient Status (1 if black, 0 otherwise). 

$X_{2ij}$ = Hispanic English Proficient Status (1 if Hispanic, 0 otherwise). 

$X_{3ij}$ = Limited English Proficient Status (1 if LEP, 0 otherwise). 

$X_{4ij}$ = Gender (1 if male, 0 if female). 

$X_{5ij}$ = Free or Reduced Lunch Status (1 if subsidized, 0 otherwise). 

$X_{6ij}$ = School Mobility Rate (same for all $i$ in each $j$). 

$X_{7ij}$ = School Overcrowdedness (same for all $i$ in each $j$). 

$X_{8ij}$ = Block Average Family Income 

$X_{9ij}$ = Block Average Family Education Level 

$X_{10ij}$ = Block Average Family Poverty Level 

$X_{kij}$ indicates the variable $k$ for the $i$th student in school $j$, for $i = 1, 2, \ldots, I_j$ and $j = 1, 2, \ldots, J$. 

Student Level Variables: 

$r^{95}_{ij}$ = Posttest Residual Score from the fairness stage for measure $l$ for the $i$th student in 
school $j$. In this paper it represents ITBS Reading 1995 or ITBS Mathematics 
1995. 

$r^{94}_{pij}$ = $p$th predictor used to estimate $r^{95}_{ij}$ for the $i$th student in school $j$. This is a Pretest 
Residual score from the fairness stage. In this paper it represents ITBS 
Reading 1994 and ITBS Mathematics 1994. 

$r_{ij} = Y_{ij} - \hat{Y}_{ij}$ from OLS. 


School Level Variables: 

$W_{1j}$ = School Mobility 

$W_{2j}$ = School Overcrowdedness 

$W_{3j}$ = School Average Family Income 

$W_{4j}$ = School Average Family Education 

$W_{5j}$ = School Average Family Poverty Index 

$W_{6j}$ = School Percentage on Free or Reduced Lunch 

$W_{7j}$ = School Percentage Minority 

$W_{8j}$ = School Percentage Black 

$W_{9j}$ = School Percentage Hispanic 

$W_{10j}$ = School Percentage Limited English Proficient 

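School Rankings 

Stage 1: 

$$
\begin{aligned}
Y_{ij} ={}& \Lambda_0 + \Lambda_1 X_{1ij} + \Lambda_2 X_{2ij} + \Lambda_3 X_{3ij} + \Lambda_4 X_{4ij} + \Lambda_5 X_{5ij} \\
&+ \Lambda_6 X_{6ij} + \Lambda_7 X_{7ij} + \Lambda_8 X_{8ij} + \Lambda_9 X_{9ij} + \Lambda_{10} X_{10ij} \\
&+ \Lambda_{11}(X_{1ij}X_{4ij}) + \Lambda_{12}(X_{2ij}X_{4ij}) + \Lambda_{13}(X_{3ij}X_{4ij}) \\
&+ \Lambda_{14}(X_{1ij}X_{5ij}) + \Lambda_{15}(X_{2ij}X_{5ij}) + \Lambda_{16}(X_{3ij}X_{5ij}) + \Lambda_{17}(X_{4ij}X_{5ij}) \\
&+ \Lambda_{18}(X_{1ij}X_{4ij}X_{5ij}) + \Lambda_{19}(X_{2ij}X_{4ij}X_{5ij}) + \Lambda_{20}(X_{3ij}X_{4ij}X_{5ij}) + \varepsilon_{ij}
\end{aligned}
$$

where $\varepsilon_{ij} \sim \text{i.i.d. } N(0, \sigma^2)$. 

$Y^{94}_{i}, Y^{95}_{i}$ = Student’s scores in 93/94 and 94/95 respectively, for math and reading. 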



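Stage 2: 

$$ r^{95}_{ij} = \beta_{0j} + \beta_{1j} r^{94}_{1ij} + \beta_{2j} r^{94}_{2ij} + \delta_{ij} $$

$$
\begin{aligned}
\beta_{kj} ={}& \gamma_{k0} + \gamma_{k1}W_{1j} + \gamma_{k2}W_{2j} + \gamma_{k3}W_{3j} + \gamma_{k4}W_{4j} + \gamma_{k5}W_{5j} + \gamma_{k6}W_{6j} \\
&+ \gamma_{k7}W_{7j} + \gamma_{k8}W_{8j} + \gamma_{k9}W_{9j} + \gamma_{k10}W_{10j} + u_{kj}
\end{aligned}
$$

for $i = 1, 2, \ldots, I_j$; $j = 1, 2, \ldots, J$; $k = 0, 1, 2$, 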

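where $E(\delta_{ij}) = 0$, $\mathrm{Var}(\delta_{ij}) = \sigma^2$, $E(u_{kj}) = 0$, $\mathrm{Var}(u_{kj}) = \sigma^2$, and $\delta_{ij} \perp u_{kj}$. 

The school rankings are obtained from the empirical bayes residual for $\beta_{00}$, which is $u^{*}_{00}$, where 

$$ u^{*}_{00} = \beta^{*}_{00} - \left( \gamma_{00} + \sum_{s=1}^{S_q} \gamma_{0s} W_{sj} \right) $$

$$ \beta^{*}_{00} = \lambda_0 \hat{\beta}_{00} + (1 - \lambda_0)\, \gamma_{00} $$

$$ \lambda_0 = \frac{\mathrm{Var}(\beta_{00})}{\mathrm{Var}(\bar{r}_{\cdot 0})} $$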

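Teacher Rankings 

Stage 1: 

$$
\begin{aligned}
Y_{ij} ={}& \Lambda_0 + \Lambda_1 X_{1ij} + \Lambda_2 X_{2ij} + \Lambda_3 X_{3ij} + \Lambda_4 X_{4ij} + \Lambda_5 X_{5ij} \\
&+ \Lambda_6 X_{6ij} + \Lambda_7 X_{7ij} + \Lambda_8 X_{8ij} + \Lambda_9 X_{9ij} + \Lambda_{10} X_{10ij} \\
&+ \Lambda_{11}(X_{1ij}X_{4ij}) + \Lambda_{12}(X_{2ij}X_{4ij}) + \Lambda_{13}(X_{3ij}X_{4ij}) \\
&+ \Lambda_{14}(X_{1ij}X_{5ij}) + \Lambda_{15}(X_{2ij}X_{5ij}) + \Lambda_{16}(X_{3ij}X_{5ij}) + \Lambda_{17}(X_{4ij}X_{5ij}) \\
&+ \Lambda_{18}(X_{1ij}X_{4ij}X_{5ij}) + \Lambda_{19}(X_{2ij}X_{4ij}X_{5ij}) + \Lambda_{20}(X_{3ij}X_{4ij}X_{5ij}) + \varepsilon_{ij}
\end{aligned}
$$

where $\varepsilon_{ij} \sim \text{i.i.d. } N(0, \sigma^2)$. 

Stage 2: 

$$ r^{95}_{ij} = \beta_{0j} + \beta_{1j} r^{94}_{1ij} + \beta_{2j} r^{94}_{2ij} + \delta_{ij} $$

$$
\begin{aligned}
\beta_{kj} ={}& \gamma_{k0} + \gamma_{k1}W_{1j} + \gamma_{k2}W_{2j} + \gamma_{k3}W_{3j} + \gamma_{k4}W_{4j} + \gamma_{k5}W_{5j} + \gamma_{k6}W_{6j} \\
&+ \gamma_{k7}W_{7j} + \gamma_{k8}W_{8j} + \gamma_{k9}W_{9j} + \gamma_{k10}W_{10j} + u_{kj}
\end{aligned}
$$

for $i = 1, 2, \ldots, I_j$; $j = 1, 2, \ldots, J$; $k = 0, 1, 2$, 

where $E(\delta_{ij}) = 0$, $\mathrm{Var}(\delta_{ij}) = \sigma^2$, $E(u_{kj}) = 0$, $\mathrm{Var}(u_{kj}) = \sigma^2$, and $\delta_{ij} \perp u_{kj}$. 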



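$$ s_{itj} = r^{95}_{itj} - \hat{r}^{95adj}_{itj} $$

$$ \hat{\mu} = \frac{\sum_{j=1}^{J}\sum_{t=1}^{T_j}\sum_{k=1}^{K_{tj}} \bar{s}^{\,k}_{tj}}{\sum_{j=1}^{J}\sum_{t=1}^{T_j} K_{tj}} $$

$$ \hat{\sigma}^2 = \frac{\sum_{j=1}^{J}\sum_{t=1}^{T_j}\sum_{k=1}^{K_{tj}} (\bar{s}^{\,k}_{tj} - \hat{\mu})^2}{\sum_{j=1}^{J}\sum_{t=1}^{T_j} K_{tj}} $$

To calculate the B.L.U.P. of $\bar{s}_{tj}$ for the $t$th teacher in school $j$, 

Let 

$$ \bar{s}_{tj} = \frac{\sum_{k=1}^{K_{tj}} \bar{s}^{\,k}_{tj}}{K_{tj}}, \qquad v_{tj} = \frac{\sum_{k=1}^{K_{tj}} (\bar{s}^{\,k}_{tj} - \bar{s}_{tj})^2}{K_{tj}}, $$

where $v_{tj}$ is the error variance for the TEI for teacher $t$ in school $j$; 
then 

$$ \mathrm{TEI}_{tj} = \hat{\mu} + (\bar{s}_{tj} - \hat{\mu}) \left( \frac{\hat{\sigma}^2}{\hat{\sigma}^2 + \dfrac{v_{tj}}{K_{tj}}} \right) $$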